Word Alignment of English-Chinese Bilingual Corpus Based on Chucks

نویسندگان

  • Le Sun
  • Youbing Jin
  • Lin Du
  • Yufang Sun
چکیده

In this paper, a method for the word alignment of English-Chinese corpus based on chunks is proposed. The chunks of English sentences are identified firstly. Then the chunk boundaries of Chinese sentences are predicted by the translations of English chunks and heuristic information. The ambiguities of Chinese chunk boundaries are resolved by the coterminous words in English chunks. With the chunk aligned bilingual corpus, a translation relation probability is proposed to align words. Finally, we evaluate our system by real corpus and present the experiment results.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Creating a Reusable English-Chinese Parallel Corpus for Bilingual Dictionary Construction

This paper first describes an experiment to construct an English-Chinese parallel corpus, then applying the Uplug word alignment tool on the corpus and finally produce and evaluate an English-Chinese word list. The Stockholm English-Chinese Parallel Corpus (SEC) was created by downloading English-Chinese parallel corpora from a Chinese web site containing law texts that have been manually trans...

متن کامل

Collocation Extraction Using Monolingual Word Alignment Method

Statistical bilingual word alignment has been well studied in the context of machine translation. This paper adapts the bilingual word alignment algorithm to monolingual scenario to extract collocations from monolingual corpus. The monolingual corpus is first replicated to generate a parallel corpus, where each sentence pair consists of two identical sentences in the same language. Then the mon...

متن کامل

Block Analysis of Bilingual Corpus for Chinese-English Statistical Machine Translation

In this paper, we describe a bilingual corpus processing strategy, block analysis, from a new point of view. By this analysis strategy, we want to extract more information from bilingual corpus for future statistical machine translation. At first, we define some block types and give some statistical data from a Chinese-English bilingual corpus under this framework. Then a block-based alignment ...

متن کامل

BIA: a Discriminative Phrase Alignment Toolkit

In most statistical machine translation systems, bilingual segments are extracted via word alignment. However, word alignment is performed independently from the requirements of the machine translation task. Furthermore, although phrase-based translation models have replacedword-based translationmodels nearly ten years ago, word-basedmodels are still widely used for word alignment. In this pape...

متن کامل

A Class-based Approach to Word Alignment

This paper presents an algorithm capable of identifying the translation for each word in a bilingual corpus. Previously proposed methods rely heavily on word-based statistics. Under a word-based approach, frequent words with a consistent translation can be aligned at a high rate of precision. However, words that are less frequent or exhibit diverse translations generally do not have statistical...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2000